Approximately Minwise Independence with Twisted Tabulation
Authors
Abstract
A random hash function $h$ is $\varepsilon$-minwise if for any set $S$ with $|S| = n$ and any element $x \in S$, $\Pr[h(x) = \min h(S)] = (1 \pm \varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within similarity estimation. Hashing from a universe $[u]$, the twisted tabulation hashing of Pǎtraşcu and Thorup [SODA'13] makes $c = O(1)$ lookups in tables of size $u^{1/c}$. Twisted tabulation was invented to get good concentration for hashing-based sampling. Here we show that twisted tabulation yields $\tilde{O}(1/u^{1/c})$-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS'79], $\tilde{O}(1/u^{1/c})$-minwise hashing requires $\Omega(\log u)$-independence [Indyk SODA'99]. Pǎtraşcu and Thorup [STOC'11] had shown that simple tabulation, using the same space and lookups, yields $\tilde{O}(1/n^{1/c})$-minwise independence, which is good for large sets but useless for small sets. Our analysis uses some of the same methods but is much cleaner, bypassing a complicated induction argument.
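To make the lookup structure concrete, here is a minimal Python sketch of a twisted-tabulation-style hash, followed by an empirical check of the minwise probability $\Pr[h(x) = \min h(S)]$. Everything in it is an illustrative assumption rather than the authors' code: the class name `TwistedTabulation`, the helper `estimate_min_probability`, the choice of $c = 4$ characters of 8 bits each (so $u = 2^{32}$ and tables of size $u^{1/c} = 256$), and the use of Python's `random` module to fill the tables. The abstract does not spell out the twisting step; the sketch assumes the commonly described form in which the tail characters produce a "twister" that is XORed into the head character before the head lookup.

```python
import random


class TwistedTabulation:
    """Illustrative twisted-tabulation-style hash (hypothetical sketch)."""

    def __init__(self, c=4, char_bits=8, out_bits=32, seed=0):
        rng = random.Random(seed)
        self.c = c
        self.char_bits = char_bits
        self.mask = (1 << char_bits) - 1
        size = 1 << char_bits  # table size u^(1/c)
        # Head table: maps a character to an out_bits-wide hash value.
        self.head = [rng.getrandbits(out_bits) for _ in range(size)]
        # Tail tables: each entry packs a hash value (high bits) and a
        # twister character (low char_bits bits). This packing is an assumption.
        self.tail = [
            [rng.getrandbits(out_bits + char_bits) for _ in range(size)]
            for _ in range(c - 1)
        ]

    def __call__(self, x):
        head_char = x & self.mask
        acc = 0
        # One lookup per tail character, combined with XOR.
        for i in range(1, self.c):
            ch = (x >> (i * self.char_bits)) & self.mask
            acc ^= self.tail[i - 1][ch]
        twister = acc & self.mask
        # Final lookup uses the head character "twisted" by the tail.
        return (acc >> self.char_bits) ^ self.head[head_char ^ twister]


def estimate_min_probability(s, x, trials=1000):
    """Fraction of independently seeded hash functions under which x
    receives the minimum hash value in s."""
    hits = 0
    for seed in range(trials):
        h = TwistedTabulation(seed=seed)
        if min(s, key=h) == x:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    S = [1, 2, 3, 4, 5]
    # For an eps-minwise family this should be close to 1/|S| = 0.2.
    print(estimate_min_probability(S, 3))
```

Each evaluation makes exactly $c$ table lookups, matching the cost stated above. The empirical estimate only illustrates the definition of $\varepsilon$-minwise hashing; it says nothing about the size of the bias $\varepsilon$ that the paper bounds.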
Similar resources
Exponential Space Improvement for minwise Based Algorithms
In this paper we introduce a general framework that exponentially improves the space, the degree of independence, and the time needed by min-wise based algorithms. The authors, in SODA11, [15] introduced an exponential time improvement for min-wise based algorithms by defining and constructing an almost k-min-wise independent family of hash functions. Here we develop an alternative approach tha...
Independence of Tabulation-Based Hash Classes
A tabulation-based hash function maps a key into d derived characters indexing random values in tables that are then combined with bitwise xor operations to give the hash. Thorup and Zhang [8] presented d-wise independent tabulation-based hash classes that use linear maps over finite fields to map a key, considered as a vector (a, b), to derived characters. We show that a variant where the deri...
Optimal Densification for Fast and Accurate Minwise Hashing
Minwise hashing is a fundamental and one of the most successful hashing algorithms in the literature. Recent advances based on the idea of densification (Shrivastava & Li, 2014a;c) have shown that it is possible to compute k minwise hashes, of a vector with d nonzeros, in mere (d + k) computations, a significant improvement over the classical O(dk). These advances have led to an algorithmic impr...
b-Bit Minwise Hashing in Practice: Large-Scale Batch and Online Learning and Using GPUs for Fast Preprocessing with Simple Hash Functions
Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [27] demonstrated a potential use of b-bit minwise hashing [26] for batch learning on large data. However, several critical issues must be tackled before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications, especially in the cont...
b-Bit Minwise Hashing for Large-Scale Learning
Minwise hashing is a standard technique in the context of search for efficiently computing set similarities. The recent development of b-bit minwise hashing provides a substantial improvement by storing only the lowest b bits of each hashed value. In this paper, we demonstrate that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and ...
Publication date: 2014